# EECS 370 Direct Mapped Caches



#### Announcements

- My office hours on Wed 11/8 are cancelled
- Extending my OH on Thursday (see calendar)
- Lab 9 due W
  - Lab 10 meets Fr/M
- Full P3 due Thu 11/9



# Agenda

- Fully-Associative vs Direct Mapped Cache
- Example
- Class Problems



#### What about stores?

- Where should you write the result of a store?
  - If that memory location is in the cache:
    - Send it to the cache.
    - Should we also send it to memory? (write-through policy)
  - If it is not in the cache:
    - Allocate the line (put it in the cache)?
       (allocate-on-write policy)
    - Write it directly to memory without allocation? (no allocate-on-write policy)



# Agenda

- Larger Cache Blocks
- Extra Problems
- LRU with More than Two Blocks
- Write-Through Cache
- Write-Back Cache



#### write-through, allocate on write (REF 3)



#### write-through, allocate on write (REF 3)



#### write-through, allocate on write (REF 4)



#### write-through, allocate on write (REF 4)



#### write-through, allocate on write (REF 6)



#### write-through, allocate on write (REF 6)



#### How many memory references?

- Each miss reads a block
  - 2 bytes in this cache
- Each store writes a byte
- Total reads: 8 bytes
- Total writes: 2 bytes
- but caches generally miss < 20%
  - Can we take advantage of that?
  - Multi-core processors have limited bandwidth between caches and memory
  - Extra stores also cost power



# Agenda

- Larger Cache Blocks
- Extra Problems
- LRU with More than Two Blocks
- Write-Through Cache
- Write-Back Cache



## Write-through vs write-back

- Can we design the cache to NOT write all stores to memory immediately?
  - Keep the most recent copy in the cache and update the memory only when that data is evicted from the cache (write-back)
  - Do we need to write-back all evicted lines?
    - No, only blocks that have been modified
    - Keep a "dirty bit", reset when the line is allocated, set when the block is stored into. If a block is "dirty" when evicted, write its data back into memory.



# Handling stores (write-back)

| Processor                                                                                | Cache                                               | Memory                                                                                                                 |
|------------------------------------------------------------------------------------------|-----------------------------------------------------|------------------------------------------------------------------------------------------------------------------------|
| Ld R1 ← M[ 1 ] Ld R2 ← M[ 7 ] St R2 → M[ 0 ] St R1 → M[ 5 ] Ld R2 ← M[ 10 ]  R0 R1 R2 R2 | V d tag data  O O O O O O O O O O O O O O O O O O O | 0 78<br>1 29<br>2 120<br>3 123<br>4 71<br>5 150<br>6 162<br>7 173<br>8 18<br>9 21<br>10 33<br>11 28<br>12 19<br>13 200 |
| R3                                                                                       | Hits: 0                                             | 14 <b>210</b><br>15 <b>225</b>                                                                                         |

#### write-back (REF 3)



#### write-back (REF 3)



#### write-back (REF 4)



#### write-back (REF 4)



#### write-back (REF 5)



#### write-back (REF 5)



#### write-back (REF 5)



## How many memory references?

- Each miss reads a block
  - 2 bytes in this cache
- Each evicted dirty cache line writes a block
- Total reads: 8 bytes
- Total writes: 4 bytes (after final eviction)

For this example, would you choose write-back or write-through?

Write-back works best when we write to a particular address multiple times before evicting



#### Review: Writes

| Store w No Allocate | Write-Back                                                               | Write-Through                                                            |
|---------------------|--------------------------------------------------------------------------|--------------------------------------------------------------------------|
| Hit?                | Write Cache                                                              | Write to Cache + Memory                                                  |
| Miss?               | Write to Memory                                                          | Write to Memory                                                          |
| Replace block?      | If evicted block is dirty, write to Memory                               | Do Nothing                                                               |
| Store w Allocate    | Write-Back                                                               | Write-Through                                                            |
| Hit?                | Write Cache                                                              | Write to Cache + Memory                                                  |
| Miss?               | Read from Memory to<br>Cache,<br>Allocate to LRU block<br>Write to Cache | Read from Memory to Cache, Allocate to LRU block Write to Cache + Memory |
| Replace block?      | If evicted block is dirty, write to Memory                               | Do Nothing                                                               |



# Agenda

- Fully-associative vs Direct Mapped Cache
- Example
- Class Problems







#### Fully-associative caches

- We designed a fully-associative cache
  - Any memory location can be copied to any cache line.
  - We check every cache tag to determine whether the data is in the cache.
- This approach can be too slow sometimes
  - Parallel tag searches are expensive and can be slow



#### Direct mapped caches

- We can redesign the cache to eliminate the requirement for parallel tag lookups
  - Direct mapped caches partition memory into as many regions as there are cache lines
  - Each memory region maps to a single cache line in which data can be placed
  - Now only one tag needs to be checked the one associated with the region the reference is in



# Mapping memory to cache (Direct-mapped)



#### Mapping memory to cache (Direct-mapped)



#### Direct mapped caches

- Two blocks in memory that map to the same index in the cache cannot be present in the cache at the same time
  - One index → one entry
- Can lead to 0% hit rate if more than one block accessed in an interleaved manner map to the same index
  - Assume addresses A and B have the same index bits but different tag bits
  - A, B, A, B, A, B, A, B, ...
  - All accesses are conflict misses



# Agenda

- Fully-associative vs Direct Mapped Cache
- Example
- Class Problems



#### **Direct-mapped cache**



Live Poll + Q&A: slido.com #eecs370

#### **Direct-mapped cache**

<u>Poll:</u> How many bits for each field?

tag line index block offset.

16 -> 4 bits. > 2 bytes.



Live Poll + Q&A: slido.com #eecs370

#### Direct-mapped (REF 1)



#### Direct-mapped (REF 1)



#### **Direct-mapped (REF 2)**



### Direct-mapped (REF 2)

<u>Poll:</u> Complete the last few instructions yourself



https://bit.ly/3tnI0nS

### **Direct-mapped (REF 3)**



#### **Direct-mapped (REF 3)**



#### **Direct-mapped (REF 4)**



#### **Direct-mapped (REF 4)**



#### **Direct-mapped (REF 4)**



#### **Direct-mapped (REF 5)**



#### **Direct-mapped (REF 5)**



#### Next time

- Discuss intermediate between fully-associative and direct mapped caches
  - "Set Associative" caches



## Agenda

- Fully-associative vs Direct Mapped Cache
- Example
- Class Problems



# offset: log, bp = b Class Problem—Storage overhead

brede: 64 byte





 Consider the following cache: 32-bit memory addresses, byte addressable, 64KB cache 64B cache block size, write-allocate, write-back, fully associative

This cache will need 512 kilobits for the data area (64 kilobytes times 8 bits per byte). Note that in this context, 1 kilobyte = 1024 bytes (NOT 1000 bytes!) Besides the actual cached data, this cache will need other storage. Consider tags, valid bits, dirty bits, bits to keep track of LRU, and anything else that you think is necessary.

How many additional bits (not counting the data) will be needed to implement this

cache?





## Class Problem—Storage overhead

• Consider the following cache:

32-bit memory addresses, byte addressable, 64KB cache
64B cache block size, write-allocate, write-back, fully associative

• Consider the following cache:

2 bytes

3 bytes

4 bytes

5 bytes

5 bytes

6 b

This cache will need 512 kilobits for the data area (64 kilobytes times 8 bits per byte). Note that in this context, 1 kilobyte = 1024 bytes (NOT 1000 bytes!) Besides the actual cached data, this cache will need other storage. Consider tags, valid bits, dirty bits, bits to keep track of LRU, and anything else that you think is necessary.

How many additional bits (not counting the data) will be needed to implement this cache?

## Class Problem—Storage overhead

Consider the following cache:
 32-bit memory addresses, byte addressable, 64KB cache
 64B cache block size, write-allocate, write-back, fully associative

This cache will need 512 kilobits for the data area (64 kilobytes times 8 bits per byte). Note that in this context, 1 kilobyte = 1024 bytes (NOT 1000 bytes!) Besides the actual cached data, this cache will need other storage. Consider tags, valid bits, dirty bits, bits to keep track of LRU, and anything else that you think is necessary.

 How many additional bits (not counting the data) will be needed to implement this cache?

```
Tag bits = 32 - log(64) = 26 bits
#lines = 64KB/64B = 1024
LRU = log(1024) = 10 bits
1 valid bit, 1 dirty bit
```



## Class Problem—Analyze performance

 Suppose that accessing a cache takes 10ns while accessing main memory in case of cache-miss takes 100ns. What is the average memory access time if the cache hit rate is 97%?

• To improve performance, the cache size is increased. It is determined that this will increase the <a href="https://district.nih.google.com">hit rate by 1%</a>, but it will also increase the time for accessing the cache by 2ns. Will this improve the overall average memory access time?

```
For store. - if hit: change data in Cache.

if miss: load from memory to Cache then change the data
in Cache.

Class Problem—Analyze performance

For Load. - if hit: load from Coche to register

if miss: load from memory to Ooche then Cache to register.
```

Poll

 Suppose that accessing a cache takes 10ns while accessing main memory in case of cache-miss takes 100ns. What is the average memory access time if the cache hit rate is 97%? Lit: 0.97 miss : 0.03

```
0.97x10ns + 0.03x10ns + 0.03 x100 ns
= 9.7 \text{ ns} + 0.3 \text{ ns} + 3 \text{ ns} = 13 \text{ ns}
```

 To improve performance, the cache size is increased. It is determined that this will increase the hit rate by 1%, but it will also increase the time for accessing the cache by 2ns. Will this improve the overall average memory access time?

```
12ns + 0.02 x (00n5
=12n5+2n5
= 14ns.
```

## Class Problem—Analyze performance

 Suppose that accessing a cache takes 10ns while accessing main memory in case of cache-miss takes 100ns. What is the average memory access time if the cache hit rate is 97%?

$$AMAT = 10 + (1 - 0.97)*100 = 13 \text{ ns}$$

• To improve performance, the cache size is increased. It is determined that this will increase the hit rate by 1%, but it will also increase the time for accessing the cache by 2ns. Will this improve the overall average memory access time?

$$AMAT = 12 + (1 - 0.98)*100 = 14 \text{ ns}$$

